logo

1 Set up R

  • Go to Github and download the repository as a zip file.
  • Move it to your local drive.
  • In RStudio go to File -> New Project -> Existing Directory -> Downloaded Github repository.
  • Run getwd() in your R console and check if the path ends in “Intro_to_R_gis”.
  • Open a new script: File -> New File -> R Script - this is where you will write all of today’s code.
  • Working in an R project means that our file paths are all relative to the “Intro_to_R_GIS” folder.

1.1 Quick R basics

  • R is case sensitive - read_csv is not the same as read_CSV.
  • New objects are created using the <- notation, e.g new_object <- 2 * 5 .
  • To overwrite an object use <- and its current name, e.g. current_object <- st_union(current_object).
  • Function’s arguments have to be in () and they’re defined with a =, e.g. st_transform(x = lfb_sf, crs = 27700)
  • To see function’s documentation preceed its name with a ?, e.g ?st_as_sf.
  • Write in the code editor (script) and execute your code line by line.
  • Use Ctrl + Enter to execute the current line of code.

1.2 Install & load R libraries

If you have not installed the necessary packages run install.packages() and then load them using the library() function.

install.packages("sf", dependencies = TRUE, type = "win.binary")
install.packages("tmap", dependencies = TRUE, type = "win.binary")
install.packages("tidyverse", dependencies = TRUE, type = "win.binary")

library(sf) 
library(tmap)
library(tidyverse)

2 Aims

By the end of today you will:

  • Understand what spatial data and GIS are.
  • Know how to use the Open Geography Portal.
  • Be aware of map projections and Coordinate Reference Systems (CRS) and be able to modify them.
  • Be able to load spatial data into R using the sf library.
  • Be familiar with using GSS codes to join statistics to geographies.
  • Understand how spatial objects can be manipulated using R’s tidyverse.
  • Understand how to use spatial joins.
  • Know how to make static and interactive maps in tmap.
  • Be able to export your maps and shapefiles.

3 GIS & Spatial Data

GIS stands for Geographic Information System/Science - the System part refers to the software used for capturing, storing, and manipulating spatial data, while the Science definition is concerned with scientific principles behind spatial analysis, and developing new methods and approaches to extract insight from geographic data.

Spatial Data - any data set which has, or has the potential to have, location attached to it. This includes, but is not limited to, coordinates, addresses, and geography codes.

Spatial data types:

  • Vector - points, lines, and polygons used to represent physical and administrative features. Used for displaying data with well defined extent.
  • Raster - pixel based data, often derived from satellite imagery. Used for displaying continuous or fuzzy variables.

Common spatial data formats:

  • Shapefile (.shp)
  • GeoPackage (.gpkg) / Geodatabase (.gdb)
  • GeoJSON / TopoJSON
  • Well-known-text
  • GeoTiff

3.1 Map Projection and Coordinate Reference System

In cartography, a map projection is a way to flatten a globe’s surface into a plane in order to make a map. This requires a systematic transformation of the latitudes and longitudes of locations from the surface of the globe into locations on a plane. All projections of a sphere on a plane necessarily distort the surface in some way and to some extent. Depending on the purpose of the map, some distortions are acceptable and others are not; therefore, different map projections exist in order to preserve some properties of the sphere-like body at the expense of other properties. Every distinct map projection distorts in a distinct way, by definition.

Source: Wikipedia

Source

All you need to know for today is:

  • When working with GB data use the British National Grid (BNG).
  • BNG uses Eastings and Northings which are given as metres, offset from the origin point.
  • BNG’s EPSG code is 27700.

3.2 GIS and R

R is commonly used for statistical analysis and programming, however it also has a whole range of GIS tools. There is a long history of geospatial libraries being developed for R and an amazing community of researchers and programmers around it. In the last few years, working with spatial data became much easier in R, with the development of the sf package. sf keeps all spatial information for each observation in a geometry column which means that we can treat it like a normal data frame but also perform all types of spatial operations on the data.

Simple feature collection with 6 features and 2 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 543417.3 ymin: 183488.5 xmax: 551943.8 ymax: 191137.3
epsg (SRID):    NA
proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
     wd19cd         wd19nm                       geometry
1 E05000026          Abbey MULTIPOLYGON (((544338.3 18...
2 E05000027         Alibon MULTIPOLYGON (((549604.1 18...
3 E05000028      Becontree MULTIPOLYGON (((547563.4 18...
4 E05000029 Chadwell Heath MULTIPOLYGON (((548881 1910...
5 E05000030      Eastbrook MULTIPOLYGON (((551552.9 18...
6 E05000031       Eastbury MULTIPOLYGON (((547271.2 18...

3.3 Sources of spatial data

When working with most types of data you will most commonly need to link them to official administrative and census boundaries, as well as more general geographic data. Most national and local governments will have a geography portal where boundaries can be downloaded. At ONS boundaries, lookups, and documentation can be downloaded from the Open Geography Portal, while other geographic featues can be accessed from the Ordnance Survey.

Boundaries managed by the ONS come in several resolutions (more generalised boundaries sacrifice accuracy and precision for file size and processing speeds):

  • F - Full resolution
  • G - Generalised
  • U - Ultra generalised

and extents (extent determines where the land/water boundary is - extent of the realm includes areas of water):

  • C - Clipped to coastline
  • E - Extent of the realm

3.3.1 Exercise - Open Geography Portal

  • Go to the Open Geography Portal and download the 2019 Local Authority District BGC boundaries. Save them to data/shp in your project structure and unzip.
  • Open the folder and see how many different files there are.

4 London Fire Brigade Animal Rescue Data

Troughout this tutorial we will be using data from the London Fire Brigade - LFB Animal Rescue Data. It covers all incidents between 2009 and 2020 which included assistance to animals that may be trapped or in distress. The data is updated monthly and includes a range of variables for each incident including some location information (postcode, borough, ward),the date/time of the incidents, cost, and type of animal in trouble.

We want to visualise, and better understand how much money has been spent on animal related incidents between 2009 and 2020, and what the distribution is at the MSOA level of geography. To achieve this we will have to import spatial data, manipulate it and create summary statistics, and then plot it.

4.1 Loading spatial and non-spatial data

LFB data has been tidied up and saved as a Comma Separated Value file (.csv). We can use read_csv to open it in R.

4.1.1 Exercise - open LFB data

  • Create a new object called lfb by using read_csv(). Load data located in “data/csv/lfb_2009_2020.csv”.
  • Use glimpse() or head() to view lfb structure.

Solution

lfb <- read_csv("data/csv/lfb_2009_2020.csv")
head(lfb)

lfb is currently just a data frame. It has not got an explicit geometry column which links observations to their geographic location. It does however contain several columns which can be used to convert it into a spatial data format.

Ward_code column references the GSS codes of wards within which the observations fall. GSS codes can be used to join lfb data to boundaries from the Open Geography Portal. One issue with this particular column is that it does not indicate the currency of GSS codes. Wards are subject to frequent change, and as such it is best practice to be clear about the dates of any boundaries used by stating the exact code used, e.g. wd19cd. Because LFB data does not include this information we have no guarantee that the boundaries and GSS codes we join will match.

Fortunately we have also been provided with columns recording the easting, and northing of each incident. We can use those to convert lfb into an sf object. To achieve this we will use the st_as_sf() function which takes the following arguments:

new_object <- st_as_sf(x = input_data_frame, coords = c("x_coordinate_column", "y_coordinate_column"), crs = 27700)

4.1.2 Exercise - create spatial data

  • Create a new object called lfb_sf by converting lfb using the st_as_sf() function.
  • Use glimpse() or head() to view lfb_sf structure.

Solution

lfb_sf <- st_as_sf(x = lfb, coords = c("easting", "northing"), crs = 27700)
head(lfb_sf)
Simple feature collection with 6 features and 10 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 504650 ymin: 164950 xmax: 554650 ymax: 192350
epsg (SRID):    27700
proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs

st_as_sf() converted the easting and northing columns to simple feature geometries and created a new column called geometry which holds spatial information for each row. Now that lfb is a spatial object we can plot it using the tmap package. For now we will use the qtm() function which creates a quick map, using tmap's default settings. qtm() only needs to be supplied with a simple feature object and is very useful for quickly inspecting your data.

4.1.3 Exercise - quick static maps

  • Plot lfb_sf using the qtm() function.

Solution

qtm(lfb_sf)

We can also create interactive maps using the tmap, package by running tmap_mode("view") before executing qtm(). To reverse it and go back to static maps use tmap_mode("plot").

4.1.4 Exercise - quick interactive maps

  • Make an interactive map of lfb_sf using the qtm() function and tmap_mode("view").

Solution

tmap_mode("view")
qtm(lfb_sf)

4.2 Filtering by GSS code

It looks like some of the locations are located outside of London, however we are only interested in incidents within the Local Authority Districts making up Greater London. To remove all points outside of London we will have to first import the LAD boundaries which we downloaded from the Open Geography Portal and then use them to spatially filter lfb_sf data.

So far we have created our own sf objects by adding a geometry column. The LAD data set is already a spatial one and as such we can use the st_read() function from the sf package to import it. st_read is extremely versatile and able to import most spatial data formats into R. The only argument that needs to be supplied to st_read is the full path to the LAD boundaries

4.2.1 Exercise - loading shapefiles

  • Use st_read() to load the LAD boundaries you downloaded at the beginning of the tutorial, aslad_2019.
  • LAD path - data/shp/Local_Authority_Districts_December_2019_Boundaries_UK_BGC/ Local_Authority_Districts_December_2019_Boundaries_UK_BGC.shp
  • Make a static map of the object you have just created using qtm() and setting tmap_mode("plot").

Solution

lad_2019 <- st_read("data/shp/Local_Authority_Districts_December_2019_Boundaries_UK_BGC/Local_Authority_Districts_December_2019_Boundaries_UK_BGC.shp")
Reading layer `Local_Authority_Districts_December_2019_Boundaries_UK_BGC' from data source `D:\1_projects\Intro_to_geography\intro_to_gis\data\shp\Local_Authority_Districts_December_2019_Boundaries_UK_BGC\Local_Authority_Districts_December_2019_Boundaries_UK_BGC.shp' using driver `ESRI Shapefile'
Simple feature collection with 382 features and 10 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -116.1928 ymin: 5342.7 xmax: 655653.8 ymax: 1220302
epsg (SRID):    NA
proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
tmap_mode("plot")
tmap mode set to plotting
qtm(lad_2019)

LAD boundaries have loaded correctly but they currently cover all of the UK when all we need is London. Because simple feature objects are data frames with a geometry column attached, any operations that we would perform on a normal data frame can also be performed on an object of class sf. Here we will use the dplyr::filter and stringr::str_detect() from the the tidyverse package to only keep LADs whose GSS code starts with “E09”.

4.2.2 Exercise - filter spatial data by variable

  • Inspect lad_2019 using head() or glimpse(), and identify which column holds the GSS codes - it should end in “cd”.
  • Create a new object called london_lad. Use dplyr::filter alongside stringr::str_detect() to only keep observations which have a GSS code starting with “E09”.
  • Plot london_lad to see if the results look correct.

Solution

head(lad_2019)
Simple feature collection with 6 features and 10 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 344666.1 ymin: 378867 xmax: 478441.5 ymax: 537152
epsg (SRID):    NA
proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
  objectid   lad19cd              lad19nm lad19nmw  bng_e  bng_n      long      lat st_areasha st_lengths                       geometry
1        1 E06000001           Hartlepool     <NA> 447160 531474 -1.270189 54.67614   93770350   68481.65 MULTIPOLYGON (((447097 5371...
2        2 E06000002        Middlesbrough     <NA> 451141 516887 -1.210998 54.54468   53858124   42570.87 MULTIPOLYGON (((449862.8 52...
3        3 E06000003 Redcar and Cleveland     <NA> 464361 519597 -1.006086 54.56752  245140395   94686.62 MULTIPOLYGON (((455939.7 52...
4        4 E06000004     Stockton-on-Tees     <NA> 444940 518183 -1.306645 54.55691  204903681  118320.90 MULTIPOLYGON (((444126.1 52...
5        5 E06000005           Darlington     <NA> 428029 515648 -1.568356 54.53534  197485809  105777.87 MULTIPOLYGON (((423475.7 52...
6        6 E06000006               Halton     <NA> 354246 382146 -2.688538 53.33425   79096448   76349.03 MULTIPOLYGON (((358374.7 38...
london_lad <- filter(lad_2019, str_detect(lad19cd, "E09"))
qtm(london_lad)

Finally, for the next step, we only need the outer boundary of London - all the internal LAD boundaries have to be removed and only the outer edges kept. sf has a function exactly for this purpose called st_union(). It only takes one argument, which is the sf object we want to unionise.

4.2.3 Exercise - dissolve boundaries

  • Create a new object called london_boundary using the st_union function.
  • Plot it to check the results.

Solution

london_boundary <- st_union(london_lad)
qtm(london_boundary)

4.3 Spatial subsetting and CRS

In addition to subsetting by value, as we did with the LAD boundaries earlier, we can also subset observations by evaluating their spatial relationship with another data set. We can for example select all LADs which are fully within Wales, every Output Area intersected by a river, or all households outside of city boundaries. There are a number of different spatial relationships which can be tested and used to subset observations.

sf has an inbuilt function called st_filter() which we can use to spatially subset observations. The function takes several arguments:

  • x - sf data frame we want to subset - lfb_sf
  • y - sf object used to evaluate the spatial relationship - london_boundary

Before running any spatial operations on two spatial objects it is always worth checking if their coordinate reference systems (CRS) match. sf will throw an error if that’s not the case. Try it for yourself below.

4.3.1 Exercise - spatial subset part 1

  • Use st_filter() to spatially subset lfb_sf by testing its relationship with london_boundary.

Solution:

lfb_sf <- st_filter(x = lfb_sf, y = london_boundary)

You should have got an error here saying Error in !inherits(x, "sf") : st_crs(x) == st_crs(y) is not TRUE. It means that objects x and y have different CRS. We can see this for ourselves by running the st_crs() function, which returns the coordinate reference system of an object.

4.3.2 Exercise - check CRS

  • Run st_crs() on both and lfb_sf and london_boundary and compare the results.

Solution:

st_crs(lfb_sf)
Coordinate Reference System:
  EPSG: 27700 
  proj4string: "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs"
st_crs(london_boundary)
Coordinate Reference System:
  No EPSG code
  proj4string: "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs"

We can see that london_boundary has not got an EPSG code, and that the proj4string information is different. We can solve this problem by transforming london_boundary’s CRS to match that of lfb_sf, simply by using the correct EPSG code. To do so we will use the st_transform() function which takes two arguments:

  • x - sf object to be transformed
  • crs - EPSG code that we want to transform our data to - BNG is 27700.

4.3.3 Exercise - transform CRS

  • Run st_transform() to transform and overwrite london_boundary. Remember to set the correct CRS.
  • Run st_crs() on lfb_sf and newly transformed london_boundary and compare the results.

Solution:

london_boundary <- st_transform(london_boundary, crs = 27700)

st_crs(lfb_sf)
Coordinate Reference System:
  EPSG: 27700 
  proj4string: "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs"
st_crs(london_boundary)
Coordinate Reference System:
  EPSG: 27700 
  proj4string: "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs"

Now that the CRS are matching we should be able to spatially subset lfb_sf.

4.3.4 Exercise - spatial subset part 2

  • Use st_filter to spatially subset lfb_sf by testing its relationship with london_boundary. Overwrite lfb_sf with the subset data.
  • Plot it to check if the results are correct.

Solution:

lfb_sf <- st_filter(x = lfb_sf, y = london_boundary)
qtm(lfb_sf)

4.4 Spatial and non-spatial joins

Simple features data can be joined to other data sets in two ways. We can either use a traditional, SQL like join, based on a value shared across the data sets or, since we have a geometry column, on the spatial relationship between the data sets. This is known as a spatial join, where variables from one data set are joined to another one only on the basis of their spatial relationship. The most commonly used operation is known as a Point-in-Polygon join where data from a polygon is joined to the points within them.

In sf spatial joins are handled using the st_join() function with arguments:

  • x - sf object to which we are joining data (LHS in SQL)
  • y - sf object whose variables are being joined (RHS in SQL)

We will be joining the Middle Super Output Areas to LFB locations, which will then allow us to group and plot data at MSOA level.

4.4.1 Exercise - spatial joins

  • Read in data/shp/MSOA_2011_london/msoa_2011_ew_bgc.shp as msoa_london - use st_read()
  • Check if msoa_london’s CRS and that of lfb_sf match. Transform msoa_london if necessary.
  • Create a new object called lfb_msoa_sf by running st_join() between lfb_sf and msoa_london
  • Inspect your new object using head() or glimpse() to see what columns have been added.
msoa_london <- st_read("data/shp/MSOA_2011_london/msoa_2011_ew_bgc.shp")
Reading layer `msoa_2011_ew_bgc' from data source `D:\1_projects\Intro_to_geography\intro_to_gis\data\shp\MSOA_2011_london\msoa_2011_ew_bgc.shp' using driver `ESRI Shapefile'
Simple feature collection with 983 features and 2 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 503574.2 ymin: 155850.8 xmax: 561956.7 ymax: 200933.6
epsg (SRID):    NA
proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs
st_crs(msoa_london)
Coordinate Reference System:
  No EPSG code
  proj4string: "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs"
st_crs(lfb_sf)
Coordinate Reference System:
  EPSG: 27700 
  proj4string: "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs"
msoa_london <- st_transform(msoa_london, crs = 27700)

lfb_msoa_sf <- st_join(lfb_sf, msoa_london)

head(lfb_msoa_sf)
Simple feature collection with 6 features and 12 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 504650 ymin: 164950 xmax: 554650 ymax: 192350
epsg (SRID):    27700
proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs

Before we proceed let’s check if all points were succesfully joined. If that’s not the case, some observations will be NA.

4.4.2 Exercise - remove NA joins

  • Use filter() and is.na() on the msoa11cd variable of lfb_msoa_sf to check if any points did not join correctly.
  • Save those to a new object called lfb_msoa_sf_na
  • Create an interactive map of the points and see why they did not join correctly.
  • Remove all NA observations from lfb_msoa_sfand overwrite it - use !is.na to find the correct subset.

Solution

tmap_mode("view")
tmap mode set to interactive viewing
lfb_sf_na <- filter(lfb_msoa_sf, is.na(msoa11cd))
qtm(lfb_sf_na)


lfb_msoa_sf <- filter(lfb_msoa_sf, !is.na(msoa11cd))

Now that msoa11cd is attached to our observations we can create some summary statistics for each MSOA. As mentioned before, we can use standard tidyverse functions on sf objects. Here, we will use dplyr to calculate the total number of incidents and their cost, and then use a non spatial join to attach those results to MSOA boundaries. At this stage we no longer need the geometry column for each LFB incident as a) we’re not performing any spatial operations on our points, and b) the geometry column can slow down/interrupt the dplyr::group_by function which we will be using. To remove the geometry column we can use the st_drop_geometry() function directly in the dplyr pipe.

4.4.3 Exercise - MSOA summary statistics

  • The step requires you to be familiar with dplyr’s more advanced functions. If you are struggling with this step load data/gpkg/msoa_lfb.gpkg as msoa_lfb using st_read.
  • Use st_drop_geometry() on lfb_msoa_sf to remove geometry data.
  • Create summary statistics per MSOA - sum of cost_gbp as total_cost (use na.rm = TRUE), and the total number of incidents as n_cases. You will need to use group_by() and summarise()
  • Create a new column called cost_per_incident using mutate - total_cost divided by n_cases.
  • Join lfb_msoa_stats to msoa_london, using left_join() and create a new object msoa_lfb
lfb_msoa_stats <- lfb_msoa_sf %>% 
                  st_drop_geometry() %>% 
                  group_by(msoa11cd) %>% 
                  summarise(total_cost = sum(cost_gbp, na.rm=TRUE), n_cases = n()) %>% 
                  mutate(cost_per_incident = total_cost/n_cases)
                         

msoa_lfb <- left_join(msoa_london, lfb_msoa_stats)
msoa_lfb
Simple feature collection with 983 features and 5 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 503574.2 ymin: 155850.8 xmax: 561956.7 ymax: 200933.6
epsg (SRID):    27700
proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs
First 10 features:
    msoa11cd                 msoa11nm total_cost n_cases cost_per_incident                       geometry
1  E02000001       City of London 001       3750      12          312.5000 MULTIPOLYGON (((531667.6 18...
2  E02000002 Barking and Dagenham 001       2221       7          317.2857 MULTIPOLYGON (((548881.6 19...
3  E02000003 Barking and Dagenham 002       4779      11          434.4545 MULTIPOLYGON (((549102.4 18...
4  E02000004 Barking and Dagenham 003       2917       7          416.7143 MULTIPOLYGON (((551550 1873...
5  E02000005 Barking and Dagenham 004       1113       4          278.2500 MULTIPOLYGON (((549099.6 18...
6  E02000007 Barking and Dagenham 006       2756       9          306.2222 MULTIPOLYGON (((549819.9 18...
7  E02000008 Barking and Dagenham 007       3205      10          320.5000 MULTIPOLYGON (((548171.4 18...
8  E02000009 Barking and Dagenham 008       2067       7          295.2857 MULTIPOLYGON (((546855 1863...
9  E02000010 Barking and Dagenham 009        859       3          286.3333 MULTIPOLYGON (((549618.8 18...
10 E02000011 Barking and Dagenham 010        550       2          275.0000 MULTIPOLYGON (((550244.1 18...
msoa_lfb <- st_read("data/gpkg/msoa_lfb.gpkg") 

At this stage it is a good idea to save our data. We can do this using the st_write() function. It needs an sf object and the path and name of the output.

4.4.4 Exercise - save data to gpkg

  • Copy and execute the following code to save your data: st_write(msoa_lfb,"output/msoa_lfb.gpkg)

5 Making better maps

Now that we have processed our data we can start mapping it. So far we have only used the qtm() function from the tmap package. This creates a default map and is great when all we want to do is quickly visualise our data. The full range of tmap functions gives us control over all elements of the final plot and allows us to create high quality maps.

tmap_mode("plot")

tm_shape(msoa_lfb) + 
  tm_polygons(col = "total_cost", border.col = "#4a4949", lwd = 0.05, title = "Total cost (£)", palette = "Blues", contrast = 1, legend.hist = TRUE,
        labels = c("0 - 2,000", ">2,000 - 4,000", ">4,000 - 6,000", ">6,000 - 8,000", ">8,000 - 10,000", 
                   ">10,000 - 12,000", ">12,000 - 14,000")) +
  tm_scale_bar(position = c(0,0), text.size = 0.7) +
  tm_layout(main.title = "Cost of animal related incidents per MSOA, between 2009 and 2020",  main.title.position =  c(0,0), main.title.size = 1, main.title.fontface = "bold", frame = FALSE, legend.position  = c(0.08,0.18),
            inner.margins = c(0.1,0.05,0.1,0.02), legend.outside = TRUE, legend.title.size  = 1, legend.text.size =  0.7, title.snap.to.legend = FALSE) +
  tm_shape(london_boundary) + tm_borders(col = "black", lwd = 0.25)

tmap follows similar principles to ggplot2, where we first specify the data source - tm_shape, then the aesthetics of the plot - tm_polygons, tm_dots, etc., and then we make any finaly adjustments - tm_layout. All functions need to be connected using the + symbol.

  • tm_shape() - sf object which you want to plot
  • tm_fill(), tm_borders(), tm_polygons(), tm_dots() - types of output
  • tm_layout() - controls layout of the map, titles, labels, etc.

tmap syntax: tm_shape(sf_object) + tm_borders(col = either "colour" or name of column which we want to plot) + tm_layout(main.title = "title of your map")

5.0.1 Guided exercise - mapping

Start by specifying which sf object is being mapped in tm_shape() and what column holds the values to be visualised. We will also change the legend’s title.

tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)")

Now let’s add london_boundary to have a thicker line around London.

tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)") + 
  tm_shape(london_boundary) + tm_borders(col = "black")

Next we will add a scale bar and position it in the bottom left corner.

tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)") + 
  tm_shape(london_boundary) + tm_borders(col = "black") +
  tm_scale_bar(position = c(0,0))

We can now remove the black frame from the map and add a title to our map.

tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)") + 
  tm_shape(london_boundary) + tm_borders(col = "black") +
  tm_scale_bar(position = c(0,0)) +
   tm_layout(title = "Average cost of animal related incidents between 2009 and 2020",  
            frame = FALSE)

All of the map elements are now visible but they’re not in the right place. We can solve this by increasing the margins around our map. This will allow the title and the legend to move outwards.

tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)") + 
  tm_shape(london_boundary) + tm_borders(col = "black") +
  tm_scale_bar(position = c(0,0)) +
   tm_layout(title = "Average cost of animal related incidents between 2009 and 2020",  
            frame = FALSE, inner.margins = c(0.1,0.1,0.1,0.15))

We can also manually change the legend labels to ensure there are no overlapping values.

tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)",
              labels = c("0 - 200", ">200 - 400", ">400 - 600", ">600 - 800", ">800 - 1,000", 
                   ">1,000 - 1,200")) + 
  tm_shape(london_boundary) + tm_borders(col = "black") +
  tm_scale_bar(position = c(0,0)) +
   tm_layout(title = "Average cost of animal related incidents between 2009 and 2020",  
            frame = FALSE, inner.margins = c(0.1,0.1,0.1,0.15))

Finally let’s change the colour of our map and increase the contrast. Choose a colour from R Colours.

tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)",
              labels = c("0 - 200", ">200 - 400", ">400 - 600", ">600 - 800", ">800 - 1,000", 
                   ">1,000 - 1,200"), palette = "Blues", contrast = 1) + 
  tm_shape(london_boundary) + tm_borders(col = "black") +
  tm_scale_bar(position = c(0,0)) +
   tm_layout(title = "Average cost of animal related incidents between 2009 and 2020",  
            frame = FALSE, inner.margins = c(0.1,0.1,0.1,0.15))

Finally, save your map as an R object and export it.

average_cost <- tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)", palette = "Blues", contrast = 1) + 
  tm_shape(london_boundary) + tm_borders(col = "black") +
  tm_scale_bar(position = c(0,0)) +
   tm_layout(title = "Average cost of animal related incidents between 2009 and 2020",  
            frame = FALSE, inner.margins = c(0.1,0.1,0.1,0.15))
tmap_save(average_cost, "output/maps/average_cost_msoa.png", width = 8, height = 5)

You can also view your choropleth as an interactive map. It helps to add an alpha argument to change your map’s transparency.

tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)", palette = "Blues", contrast = 1, alpha = 0.5) + 
  tm_shape(london_boundary) + tm_borders(col = "black") 
---
title: "Introduction to GIS in R"
author: "Robert Kaleta"
date: "February 2020"
output:
  html_notebook:
    number_sections: yes
    theme: flatly
    toc: yes
    toc_depth: 3
    toc_float: yes
#  html_document:
#    df_print: paged
#    toc: yes
#    toc_depth: '3'
---

```{r, echo=FALSE}
htmltools::img(src = knitr::image_uri("D:/3_ONS_DOCUMENTS/geospatial_logos/small geo icon.png"), 
               alt = 'logo', 
               style = 'position:absolute; top:50px; right:0; padding:10px;')
```

```{r, echo=FALSE}
knitr::opts_chunk$set(fig.align="center")
```

```{r libraries, message=FALSE, warning=FALSE, echo=FALSE}
library("sf")
library("tidyverse")
library("tmap")
library("knitr")
```

# Set up R

* Go to [Github](https://github.com/ONSGeospatial/Introduction_GIS_R) and download the repository as a zip file. 
* Move it to your local drive.
* In RStudio go to File -> New Project -> Existing Directory -> Downloaded Github repository.
* Run `getwd()` in your R console and check if the path ends in "Intro_to_R_gis".
* Open a new script: File -> New File -> R Script - this is where you will write all of today's code.
* Working in an R project means that our file paths are all relative to the "Intro_to_R_GIS" folder.

## Quick R basics

* R is case sensitive - `read_csv` is not the same as `read_CSV`.
* New objects are created using the `<-` notation, e.g `new_object <- 2 * 5` .
* To overwrite an object use `<-` and its current name, e.g. `current_object <- st_union(current_object)`.
* Function's arguments have to be in `()` and they're defined with a `=`, e.g. `st_transform(x = lfb_sf, crs = 27700)`
* To see function's documentation preceed its name with a `?`, e.g `?st_as_sf`.
* Write in the code editor (script) and execute your code line by line.
* Use `Ctrl + Enter` to execute the current line of code.

## Install & load R libraries

If you have not installed the necessary packages run `install.packages()` and then load them using the `library()` function.
```{r, eval=FALSE}
install.packages("sf", dependencies = TRUE, type = "win.binary")
install.packages("tmap", dependencies = TRUE, type = "win.binary")
install.packages("tidyverse", dependencies = TRUE, type = "win.binary")

library(sf) 
library(tmap)
library(tidyverse)
```


# Aims

**By the end of today you will: **
 
* Understand what spatial data and GIS are.
* Know how to use the Open Geography Portal.
* Be aware of map projections and Coordinate Reference Systems (CRS) and be able to modify them.
* Be able to load spatial data into R using the `sf` library.
* Be familiar with using GSS codes to join statistics to geographies.
* Understand how spatial objects can be manipulated using R's `tidyverse`.
* Understand how to use spatial joins. 
* Know how to make static and interactive maps in `tmap`.
* Be able to export your maps and shapefiles.


# GIS & Spatial Data

**GIS** stands for Geographic Information System/Science - the System part refers to the software used for capturing, storing, and manipulating spatial data, while the Science definition is concerned with scientific principles behind spatial analysis, and developing new methods and approaches to extract insight from geographic data. 

<center>
![](data/img/gis_logos.png)
</center>

**Spatial Data** - any data set which has, or has the potential to have, location attached to it. This includes, but is not limited to, coordinates, addresses, and geography codes.

**Spatial data types:**

* Vector - points, lines, and polygons used to represent physical and administrative features. Used for displaying data with well defined extent.

<center>
![](data/img/vector_example.png)
</center>

* Raster - pixel based data, often derived from satellite imagery. Used for displaying continuous or fuzzy variables. 

<center>
![](data/img/raster_example_small.png)
</center>




**Common spatial data formats:**

* Shapefile (.shp) 
* GeoPackage (.gpkg) / Geodatabase (.gdb)
* GeoJSON / TopoJSON
* Well-known-text
* GeoTiff

## Map Projection and Coordinate Reference System

*In cartography, a map projection is a way to flatten a globe's surface into a plane in order to make a map. This requires a systematic transformation of the latitudes and longitudes of locations from the surface of the globe into locations on a plane. All projections of a sphere on a plane necessarily distort the surface in some way and to some extent. Depending on the purpose of the map, some distortions are acceptable and others are not; therefore, different map projections exist in order to preserve some properties of the sphere-like body at the expense of other properties. Every distinct map projection distorts in a distinct way, by definition.*

<font size="2">Source: Wikipedia </font>

<center>
![](data/img/map_projection.png)
</center>
<font size="2">[Source](https://www.researchgate.net/profile/Bojan_Savric2/publication/298354278/figure/fig1/AS:614354953199631@1523485043473/The-nine-small---scale-map-projections-used-in-the-paired-comparison-test-arranged-by.png)</font>

**All you need to know for today is:**

* When working with GB data use the **British National Grid (BNG)**.
* BNG uses **Eastings** and **Northings** which are given as metres, offset from the origin point.
* BNG's EPSG code is **27700**.



<center>
![](data/img/bng.png)

</center>


## GIS and R

R is commonly used for statistical analysis and programming, however it also has a whole range of GIS tools. There is a long history of geospatial libraries being developed for R and an amazing community of researchers and programmers around it. In the last few years, working with spatial data became much easier in R, with the development of the `sf` package. `sf` keeps all spatial information for each observation in a geometry column which means that we can treat it like a normal data frame but also perform all types of spatial operations on the data.  

```{r echo=FALSE}
wd_2019_bgc <- st_read("data/shp/Wards_December_2019_Boundaries_EW_BGC/Wards_December_2019_Boundaries_EW_BGC.shp", quiet = TRUE) %>% 
  select(wd19cd, wd19nm)
head(wd_2019_bgc)
```


## Sources of spatial data

When working with most types of data you will most commonly need to link them to official administrative and census boundaries, as well as more general geographic data. Most national and local governments will have a geography portal where boundaries can be downloaded. At ONS boundaries, lookups, and documentation can be downloaded from the [Open Geography Portal](http://geoportal.statistics.gov.uk/), while other geographic featues can be accessed from the [Ordnance Survey](http://ordnancesurvey.co.uk).  

<center>
![](data/img/geo_portal.png)
</center>  



Boundaries managed by the ONS come in several resolutions (more generalised boundaries sacrifice accuracy and precision for file size and processing speeds):

* F - Full resolution
* G - Generalised
* U - Ultra generalised

and extents (extent determines where the land/water boundary is - extent of the realm includes areas of water):


* C - Clipped to coastline
* E - Extent of the realm


<center>
![](data/img/bgc_small.png)
</center>


### Exercise - Open Geography Portal

* Go to the [Open Geography Portal](http://geoportal.statistics.gov.uk/) and download the 2019 Local Authority District BGC boundaries. Save them to `data/shp` in your project structure and unzip. 
* Open the folder and see how many different files there are.


# London Fire Brigade Animal Rescue Data

Troughout this tutorial we will be using data from the London Fire Brigade - [LFB Animal Rescue Data](https://data.london.gov.uk/dataset/animal-rescue-incidents-attended-by-lfb). It covers all incidents between 2009 and 2020 which included assistance to animals that may be trapped or in distress. The data is updated monthly and includes a range of variables for each incident including some location information (postcode, borough, ward),the date/time of the incidents, cost, and type of animal in trouble. 

We want to visualise, and better understand how much money has been spent on animal related incidents between 2009 and 2020, and what the distribution is at the MSOA level of geography. To achieve this we will have to import spatial data, manipulate it and create summary statistics, and then plot it.

## Loading spatial and non-spatial data

LFB data has been tidied up and saved as a Comma Separated Value file (.csv). We can use `read_csv` to open it in R.

### Exercise - open LFB data

* Create a new object called `lfb` by using `read_csv()`. Load data located in "data/csv/lfb_2009_2020.csv".
* Use `glimpse()` or `head()` to view `lfb` structure.

**Solution**
```{r message=FALSE}
lfb <- read_csv("data/csv/lfb_2009_2020.csv")
head(lfb)
```

`lfb` is currently just a data frame. It has not got an explicit geometry column which links observations to their geographic location. It does however contain several columns which can be used to convert it into a spatial data format.   

**Ward_code** column references the GSS codes of wards within which the observations fall. GSS codes can be used to join `lfb` data to boundaries from the Open Geography Portal. One issue with this particular column is that it does not indicate the currency of GSS codes. Wards are subject to frequent change, and as such it is best practice to be clear about the dates of any boundaries used by stating the exact code used, e.g. **wd19cd**. Because LFB data does not include this information we have no guarantee that the boundaries and GSS codes we join will match.  

Fortunately we have also been provided with columns recording the **easting**, and **northing** of each incident. We can use those to convert `lfb` into an `sf` object. To achieve this we will use the `st_as_sf()` function which takes the following arguments:

`new_object <- st_as_sf(x = input_data_frame, coords = c("x_coordinate_column", "y_coordinate_column"), crs = 27700)`

### Exercise - create spatial data

* Create a new object called `lfb_sf` by converting `lfb` using the `st_as_sf()` function.
* Use `glimpse()` or `head()` to view `lfb_sf` structure.

**Solution**
```{r}
lfb_sf <- st_as_sf(x = lfb, coords = c("easting", "northing"), crs = 27700)
head(lfb_sf)
```

`st_as_sf()` converted the **easting** and **northing** columns to simple feature geometries and created a new column called **geometry** which holds spatial information for each row. 
Now that `lfb` is a spatial object we can plot it using the `tmap` package. For now we will use the `qtm()` function which creates a quick map, using `tmap's` default settings. `qtm()` only needs to be supplied with a simple feature object and is very useful for quickly inspecting your data. 

### Exercise - quick static maps

* Plot lfb_sf using the `qtm()` function.

**Solution**
```{r}
qtm(lfb_sf)
```


We can also create interactive maps using the `tmap`, package by running `tmap_mode("view")` before executing `qtm()`. 
To reverse it and go back to static maps use `tmap_mode("plot")`.

### Exercise - quick interactive maps

* Make an interactive map of `lfb_sf` using the `qtm()` function and `tmap_mode("view")`.

**Solution**
```{r, message=FALSE}
tmap_mode("view")
qtm(lfb_sf)
```

## Filtering by GSS code

It looks like some of the locations are located outside of London, however we are only interested in incidents within the Local Authority Districts making up Greater London. To remove all points outside of London we will have to first import the LAD boundaries which we downloaded from the Open Geography Portal and then use them to spatially filter `lfb_sf` data. 

So far we have created our own `sf` objects by adding a geometry column. The LAD data set is already a spatial one and as such we can use the `st_read()` function from the `sf` package to import it. `st_read` is extremely versatile and able to import most spatial data formats into R. The only argument that needs to be supplied to `st_read` is the full path to the LAD boundaries 

### Exercise - loading shapefiles

* Use `st_read()` to load the LAD boundaries you downloaded at the beginning of the tutorial, as`lad_2019`.
* LAD path - `data/shp/Local_Authority_Districts_December_2019_Boundaries_UK_BGC/`
              `Local_Authority_Districts_December_2019_Boundaries_UK_BGC.shp`
* Make a static map of the object you have just created using `qtm()` and setting `tmap_mode("plot")`.

**Solution**
```{r}
lad_2019 <- st_read("data/shp/Local_Authority_Districts_December_2019_Boundaries_UK_BGC/Local_Authority_Districts_December_2019_Boundaries_UK_BGC.shp")
tmap_mode("plot")
qtm(lad_2019)

```

LAD boundaries have loaded correctly but they currently cover all of the UK when all we need is London. Because simple feature objects are data frames with a geometry column attached, any operations that we would perform on a normal data frame can also be performed on an object of class `sf`. Here we will use the `dplyr::filter` and `stringr::str_detect()` from the the `tidyverse` package to only keep LADs whose GSS code starts with "E09".


### Exercise - filter spatial data by variable

* Inspect lad_2019 using `head()` or `glimpse()`, and identify which column holds the GSS codes - it should end in "cd".
* Create a new object called `london_lad`. Use `dplyr::filter` alongside `stringr::str_detect()` to only keep observations which have a GSS code starting with "E09".
* Plot `london_lad` to see if the results look correct.

**Solution**
```{r}
head(lad_2019)
```

```{r}
london_lad <- filter(lad_2019, str_detect(lad19cd, "E09"))
qtm(london_lad)
```

Finally, for the next step, we only need the outer boundary of London - all the internal LAD boundaries have to be removed and only the outer edges kept. `sf` has a function exactly for this purpose called `st_union()`. 
It only takes one argument, which is the `sf` object we want to unionise. 

### Exercise - dissolve boundaries

* Create a new object called `london_boundary` using the `st_union` function.
* Plot it to check the results.

**Solution**
```{r}
london_boundary <- st_union(london_lad)
qtm(london_boundary)
```


## Spatial subsetting and CRS

In addition to subsetting by value, as we did with the LAD boundaries earlier, we can also subset observations by evaluating their spatial relationship with another data set. We can for example select all LADs which are fully within Wales, every Output Area intersected by a river, or all households outside of city boundaries. There are a number of different spatial relationships which can be tested and used to subset observations.

<center>
![](data/img/spatial_relation.png) 
</center>


`sf` has an inbuilt function called `st_filter()` which we can use to spatially subset observations.
The function takes several arguments:

* x - `sf` data frame we want to subset - `lfb_sf`
* y - `sf` object used to evaluate the spatial relationship - `london_boundary`

Before running any spatial operations on two spatial objects it is always worth checking if their coordinate reference systems (CRS) match. `sf` will throw an error if that's not the case. Try it for yourself below.

### Exercise - spatial subset part 1 

* Use `st_filter()` to spatially subset `lfb_sf` by testing its relationship with `london_boundary`.

**Solution:**
```{r, eval=FALSE}
lfb_sf <- st_filter(x = lfb_sf, y = london_boundary)

```

You should have got an error here saying `Error in !inherits(x, "sf") : st_crs(x) == st_crs(y) is not TRUE`. It means that objects x and y have different CRS. We can see this for ourselves by running the `st_crs()` function, which returns the coordinate reference system of an object.

### Exercise  - check CRS

* Run `st_crs()` on both and `lfb_sf` and `london_boundary` and compare the results.

**Solution:**
```{r}
st_crs(lfb_sf)

st_crs(london_boundary)

```

We can see that `london_boundary` has not got an EPSG code, and that the proj4string information is different. We can  solve this problem by transforming `london_boundary`'s CRS to match that of `lfb_sf`, simply by using the correct EPSG code. To do so we will use the `st_transform()` function which takes two arguments: 

* x - `sf` object to be transformed
* crs - EPSG code that we want to transform our data to - BNG is 27700.

### Exercise - transform CRS

* Run `st_transform()` to transform and overwrite `london_boundary`. Remember to set the correct CRS. 
* Run `st_crs()` on `lfb_sf` and newly transformed `london_boundary` and compare the results.

**Solution:**

```{r}
london_boundary <- st_transform(london_boundary, crs = 27700)

st_crs(lfb_sf)
st_crs(london_boundary)
```


Now that the CRS are matching we should be able to spatially subset `lfb_sf`.

### Exercise - spatial subset part 2 

* Use `st_filter` to spatially subset `lfb_sf` by testing its relationship with `london_boundary`. Overwrite `lfb_sf` with the subset data.
* Plot it to check if the results are correct.

**Solution:**
```{r}
lfb_sf <- st_filter(x = lfb_sf, y = london_boundary)
qtm(lfb_sf)
```


## Spatial and non-spatial joins

Simple features data can be joined to other data sets in two ways. We can either use a traditional, SQL like join, based on a value shared across the data sets or, since we have a geometry column, on the spatial relationship between the data sets. This is known as a spatial join, where variables from one data set are joined to another one only on the basis of their spatial relationship. The most commonly used operation is known as a Point-in-Polygon join where data from a polygon is joined to the points within them.

<center>
![](data/img/spatial_join.png)
</center>  
  
  

In `sf` spatial joins are handled using the `st_join()` function with arguments:

* x - `sf` object to which we are joining data (LHS in SQL)
* y - `sf` object whose variables are being joined (RHS in SQL)

We will be joining the Middle Super Output Areas to LFB locations, which will then allow us to group and plot data at MSOA level.

### Exercise - spatial joins

* Read in `data/shp/MSOA_2011_london/msoa_2011_ew_bgc.shp` as `msoa_london` - use `st_read()`
* Check if `msoa_london`'s CRS and that of `lfb_sf` match. Transform `msoa_london` if necessary.
* Create a new object called `lfb_msoa_sf` by running `st_join()` between `lfb_sf` and `msoa_london`
* Inspect your new object using `head()` or `glimpse()` to see what columns have been added.

```{r}
msoa_london <- st_read("data/shp/MSOA_2011_london/msoa_2011_ew_bgc.shp")

st_crs(msoa_london)

st_crs(lfb_sf)

msoa_london <- st_transform(msoa_london, crs = 27700)

lfb_msoa_sf <- st_join(lfb_sf, msoa_london)

head(lfb_msoa_sf)
```


Before we proceed let's check if all points were succesfully joined. If that's not the case, some observations will be `NA`. 

### Exercise - remove NA joins

* Use `filter()` and `is.na()` on the `msoa11cd` variable of `lfb_msoa_sf` to check if any points did not join correctly. 
* Save those to a new object called `lfb_msoa_sf_na` 
* Create an interactive map of the points and see why they did not join correctly.
* Remove all `NA` observations from `lfb_msoa_sf`and overwrite it - use `!is.na` to find the correct subset.

Solution
```{r}
tmap_mode("view")
lfb_sf_na <- filter(lfb_msoa_sf, is.na(msoa11cd))
qtm(lfb_sf_na)

lfb_msoa_sf <- filter(lfb_msoa_sf, !is.na(msoa11cd))
```


Now that `msoa11cd` is attached to our observations we can create some summary statistics for each MSOA. As mentioned before, we can use standard `tidyverse` functions on `sf` objects. Here, we will use `dplyr` to calculate the total number of incidents and their cost, and then use a non spatial join to attach those results to MSOA boundaries. At this stage we no longer need the geometry column for each LFB incident as a) we're not performing any spatial operations on our points, and b) the geometry column can slow down/interrupt the `dplyr::group_by` function which we will be using. To remove the geometry column we can use the `st_drop_geometry()` function directly in the dplyr pipe. 

### Exercise - MSOA summary statistics

* The step requires you to be familiar with `dplyr`'s more advanced functions. If you are struggling with this step load `data/gpkg/msoa_lfb.gpkg` as `msoa_lfb` using `st_read`.
* Use `st_drop_geometry()` on `lfb_msoa_sf` to remove geometry data. 
* Create summary statistics per MSOA - sum of cost_gbp as total_cost (use `na.rm = TRUE`), and the total number of incidents as n_cases. You will need to use `group_by()` and `summarise()`
* Create a new column called `cost_per_incident` using `mutate` - `total_cost` divided by `n_cases`.
* Join `lfb_msoa_stats` to `msoa_london`, using `left_join()` and create a new object `msoa_lfb`

```{r, message=FALSE}
lfb_msoa_stats <- lfb_msoa_sf %>% 
                  st_drop_geometry() %>% 
                  group_by(msoa11cd) %>% 
                  summarise(total_cost = sum(cost_gbp, na.rm=TRUE), n_cases = n()) %>% 
                  mutate(cost_per_incident = total_cost/n_cases)
                         

msoa_lfb <- left_join(msoa_london, lfb_msoa_stats)
msoa_lfb
```

```{r, eval=FALSE}
msoa_lfb <- st_read("data/gpkg/msoa_lfb.gpkg") 

```

At this stage it is a good idea to save our data. We can do this using the `st_write()` function. It needs an `sf` object and the path and name of the output.

### Exercise - save data to gpkg

* Copy and execute the following code to save your data: `st_write(msoa_lfb,"output/msoa_lfb.gpkg)`


# Making better maps

Now that we have processed our data we can start mapping it. So far we have only used the `qtm()` function from the `tmap` package. This creates a default map and is great when all we want to do is quickly visualise our data. The full range of `tmap` functions gives us control over all elements of the final plot and allows us to create high quality maps.

```{r, fig.align='center', message=FALSE}
tmap_mode("plot")

tm_shape(msoa_lfb) + 
  tm_polygons(col = "total_cost", border.col = "#4a4949", lwd = 0.05, title = "Total cost (£)", palette = "Blues", contrast = 1, legend.hist = TRUE,
        labels = c("0 - 2,000", ">2,000 - 4,000", ">4,000 - 6,000", ">6,000 - 8,000", ">8,000 - 10,000", 
                   ">10,000 - 12,000", ">12,000 - 14,000")) +
  tm_scale_bar(position = c(0,0), text.size = 0.7) +
  tm_layout(main.title = "Cost of animal related incidents per MSOA, between 2009 and 2020",  main.title.position =  c(0,0), main.title.size = 1, main.title.fontface = "bold", frame = FALSE, legend.position  = c(0.08,0.18),
            inner.margins = c(0.1,0.05,0.1,0.02), legend.outside = TRUE, legend.title.size  = 1, legend.text.size =  0.7, title.snap.to.legend = FALSE) +
  tm_shape(london_boundary) + tm_borders(col = "black", lwd = 0.25)
```


`tmap` follows similar principles to `ggplot2`, where we first specify the data source - `tm_shape`, then the aesthetics of the plot - `tm_polygons`, `tm_dots`, etc., and then we make any finaly adjustments - `tm_layout`. All functions need to be connected using the `+` symbol.

* `tm_shape()` - `sf` object which you want to plot
* `tm_fill()`, `tm_borders()`, `tm_polygons()`, `tm_dots()` - types of output
* `tm_layout()` - controls layout of the map, titles, labels, etc.

`tmap` syntax: `tm_shape(sf_object) + tm_borders(col = either "colour" or name of column which we want to plot) + tm_layout(main.title = "title of your map")`

### Guided exercise - mapping

Start by specifying which `sf` object is being mapped in `tm_shape()` and what column holds the values to be visualised. We will also change the legend's title.
```{r}
tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)")
```

Now let's add `london_boundary` to have a thicker line around London.
```{r}
tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)") + 
  tm_shape(london_boundary) + tm_borders(col = "black")
```

Next we will add a scale bar and position it in the bottom left corner.
```{r}
tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)") + 
  tm_shape(london_boundary) + tm_borders(col = "black") +
  tm_scale_bar(position = c(0,0))
```

We can now remove the black frame from the map and add a title to our map.
```{r}
tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)") + 
  tm_shape(london_boundary) + tm_borders(col = "black") +
  tm_scale_bar(position = c(0,0)) +
   tm_layout(title = "Average cost of animal related incidents between 2009 and 2020",  
            frame = FALSE)
```

All of the map elements are now visible but they're not in the right place. We can solve this by increasing the margins around our map. This will allow the title and the legend to move outwards.

```{r}
tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)") + 
  tm_shape(london_boundary) + tm_borders(col = "black") +
  tm_scale_bar(position = c(0,0)) +
   tm_layout(title = "Average cost of animal related incidents between 2009 and 2020",  
            frame = FALSE, inner.margins = c(0.1,0.1,0.1,0.15))

```

We can also manually change the legend labels to ensure there are no overlapping values.
```{r}
tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)",
              labels = c("0 - 200", ">200 - 400", ">400 - 600", ">600 - 800", ">800 - 1,000", 
                   ">1,000 - 1,200")) + 
  tm_shape(london_boundary) + tm_borders(col = "black") +
  tm_scale_bar(position = c(0,0)) +
   tm_layout(title = "Average cost of animal related incidents between 2009 and 2020",  
            frame = FALSE, inner.margins = c(0.1,0.1,0.1,0.15))
```


Finally let's change the colour of our map and increase the contrast. Choose a colour from [R Colours](https://www.r-graph-gallery.com/38-rcolorbrewers-palettes_files/figure-html/thecode-1.png).

```{r}
tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)",
              labels = c("0 - 200", ">200 - 400", ">400 - 600", ">600 - 800", ">800 - 1,000", 
                   ">1,000 - 1,200"), palette = "Blues", contrast = 1) + 
  tm_shape(london_boundary) + tm_borders(col = "black") +
  tm_scale_bar(position = c(0,0)) +
   tm_layout(title = "Average cost of animal related incidents between 2009 and 2020",  
            frame = FALSE, inner.margins = c(0.1,0.1,0.1,0.15))
```



Finally, save your map as an R object and export it.

```{r}
average_cost <- tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)", palette = "Blues", contrast = 1) + 
  tm_shape(london_boundary) + tm_borders(col = "black") +
  tm_scale_bar(position = c(0,0)) +
   tm_layout(title = "Average cost of animal related incidents between 2009 and 2020",  
            frame = FALSE, inner.margins = c(0.1,0.1,0.1,0.15))
```
```{r, eval=FALSE}
tmap_save(average_cost, "output/maps/average_cost_msoa.png", width = 8, height = 5)
```

You can also view your choropleth as an interactive map. It helps to add an `alpha` argument to change your map's transparency.

```{r}
tmap_mode("view")
tm_shape(msoa_lfb) + 
  tm_polygons(col = "cost_per_incident", title = "Cost per Incident (£)", palette = "Blues", contrast = 1, alpha = 0.5) + 
  tm_shape(london_boundary) + tm_borders(col = "black") 
```

# Recommended resources

[Geocomputation with R](https://geocompr.robinlovelace.net/index.html)  

[Simple Features for R](https://r-spatial.github.io/sf/index.html)  

[Spatial Data Science with R](https://www.rspatial.org/)  

[Creating demographic maps in R with tmap packages](http://www.zevross.com/blog/2018/10/02/creating-beautiful-demographic-maps-in-r-with-the-tidycensus-and-tmap-packages/)
